[navigation graph info]

[relation_subgraph_info]

[vlm_global_analysis]

[evidence_memory_info]

---
Your last instruction: [llm_instruction]
VLM's response: [vlm_response]

---
The information above is what the VLM has collected from the video so far; it may be incomplete or contain errors. Keep gathering information until you can answer the question.
If you need more information, give the VLM its next instruction, specifying the time period to focus on and the information the VLM should obtain from that period.
You can request information from the "full video" or from a specific period of the video. "Full video" is usually chosen only when it is unclear which time period contains the information.
Give only one short, simple instruction or question at a time, covering recognition, localization, judgment, or description of basic scenes, objects, and actions.
Conduct the analysis first:
Analysis: Analyze the current situation and the information collected so far. Determine whether the existing information is sufficient to answer the question. If it is not, give an instruction to obtain the missing information.

Then give the instruction or answer in JSON form.
An instruction should use the following format:
```json
{
    "period": "full video / hh:mm:ss-hh:mm:ss",
    "instruction": "A one-sentence question to VLM"
}
```
Give the final answer only if the current information is sufficient to arrive at it.
The answer format is as follows:
```json
{
    "reason": "The detailed reasoning process and rationales. Indicate the corresponding time period if necessary.",
    "answer": "Answer to the question."
}
```
